The COVID-19 pandemic is an event with far-reaching stressful consequences.
People tend to use more musical media during nationwide lockdowns (Fink et al., 2021) to cope with this crisis situation.
Given this increased active use of musical media, it seems reasonable to examine the characteristics of the music that people have listened to.
Spotify defines the so-called DACH region (Germany, Austria, and Switzerland) as one target audience, which is why we decided to examine the daily top 200 charts of these countries.
RQ: To what extent can listening behavior during the pandemic and a comparable reference period be estimated and classified via proxy variables, namely the audio features Spotify provides for each track, with particular attention to the mood-related audio features?
Web-scraping: Daily top 200 chart positions, song titles, their stream counts, and track IDs between March 10th, 2020, and June 14th, 2020, and for the same period in 2019 (\(N_{Total} = 115198\)) from Spotify’s website for the DACH countries (Germany, Austria, Switzerland)
The scraping was implemented as a simple function built on Hadley Wickham’s R package rvest (for scraping websites) and his tidyverse.
Audio features were collected by using Spotify’s API and the R package “spotifyr” (Thompson et al., 2021) on the previously retrieved song IDs.
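As a rough plausibility check on the sample size, the following sketch enumerates the days in both collection windows and derives the maximum possible number of chart rows (days × 200 positions × 3 countries). It is written in Python purely for illustration and does not reproduce the R scraping code; the country codes and all function and variable names are assumptions.

```python
from datetime import date, timedelta


def daily_range(start: date, end: date) -> list[date]:
    """Return every calendar day from start to end, both inclusive."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]


# Collection windows as stated in the text
pandemic_days = daily_range(date(2020, 3, 10), date(2020, 6, 14))
reference_days = daily_range(date(2019, 3, 10), date(2019, 6, 14))

countries = ["DE", "AT", "CH"]  # assumed codes for the DACH countries
chart_size = 200                # daily top 200

# Upper bound: one row per day, chart position, and country
max_rows = (len(pandemic_days) + len(reference_days)) * chart_size * len(countries)
print(len(pandemic_days), len(reference_days), max_rows)
```

Each window spans 97 days, so the upper bound is 116400 rows. The reported \(N_{Total} = 115198\) lies slightly below it; the source does not state which rows are absent.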
Figure 1
Comparison of Stream Counts of Daily Top 200 Spotify Charts Before and During the Pandemic Per Country and Across all DACH Countries.
Note. The x-axis is log-scaled (base 10) for visual purposes, mainly to avoid a heavy tail at the higher stream counts.
Accounting for H1: Dimensionality reduction of the mood-related audio features by using a clustering approach.
Goal:
More interpretable and summarized mood-related audio features.
Testing whether people streamed more of a certain mood cluster during the pandemic.
Implementing mood clusters as input variables into our classification model.
Table 1
K-means Cluster Solution on Min-Max-Normalized and Rescaled Mood-Related Audio Features.
| Cluster | n | Mode | Danceability | Energy | Loudness (rescaled) | Valence | Tempo (rescaled) |
|---|---|---|---|---|---|---|---|
| 1 | 26990 | 1 | .672 | .572 | .738 | .335 | .120 |
| 2 | 32626 | 0 | .754 | .716 | .812 | .678 | .121 |
| 3 | 26129 | 1 | .738 | .722 | .823 | .640 | .119 |
| 4 | 29453 | 0 | .679 | .605 | .748 | .350 | .120 |
Note. BSS/TSS ratio = 83.3%. Following the Arousal-Valence circumplex model, a mean Valence greater than or equal to .5 represents a higher (emotional) positivity potential. Likewise, mean values in Danceability, Energy, and Loudness greater than or equal to .5 represent a higher arousal potential.
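To make the table's ingredients concrete, here is a minimal pure-Python sketch of min-max normalization, k-means (Lloyd's algorithm), and the BSS/TSS ratio. It is not the authors' implementation: the analysis was run in R, the exact settings are not given in the source, and the toy data below are invented for illustration only.

```python
import random


def min_max(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(dim) / len(points) for dim in zip(*points)]


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, n_iter=100, seed=1):
    """Lloyd's algorithm with randomly sampled initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers


def bss_tss_ratio(points, labels, centers):
    """Share of total variance explained by the clustering (BSS / TSS)."""
    grand = centroid(points)
    tss = sum(sq_dist(p, grand) for p in points)
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return (tss - wss) / tss


# Invented toy data: raw tempo (BPM) and valence for six tracks
tempo = [80, 85, 90, 160, 165, 170]
valence = [0.20, 0.25, 0.30, 0.80, 0.85, 0.90]
points = [list(p) for p in zip(min_max(tempo), min_max(valence))]

labels, centers = kmeans(points, k=2)
ratio = bss_tss_ratio(points, labels, centers)
print(labels, round(ratio, 3))
```

A ratio close to 1 means most of the variance lies between clusters rather than within them, which is how the 83.3% in the note can be read.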
| Cluster | Labels | Mode | Danceability | Energy | Loudness_resc. | Valence | Tempo_resc. |
|---|---|---|---|---|---|---|---|
| 1 | Moderate Arousal-Potential neg Emotionality major | 1 | 0.672 | 0.572 | 0.738 | 0.335 | 0.120 |
| 2 | Higher Arousal-Potential pos Emotionality minor | 0 | 0.754 | 0.716 | 0.812 | 0.678 | 0.121 |
| 3 | Higher Arousal-Potential pos Emotionality major | 1 | 0.738 | 0.722 | 0.823 | 0.640 | 0.119 |
| 4 | Moderate Arousal-Potential neg Emotionality minor | 0 | 0.679 | 0.605 | 0.748 | 0.350 | 0.120 |
Figure 3
3D Scatterplot with Ellipsoids (1st SD) of the K-Means Cluster Solution Across all DACH Countries.
| mood_clust_fct | Group1 | Group2 | n1 | n2 | z | p.adj | r |
|---|---|---|---|---|---|---|---|
| Moderate Arousal-Potential neg Emotionality major | No_Pandemic | Pandemic | 250 | 275 | 3.530 | 0.002 | 0.154 |
| Moderate Arousal-Potential neg Emotionality minor | No_Pandemic | Pandemic | 288 | 283 | 3.551 | 0.002 | 0.149 |
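The source does not state how the effect size r was computed. One common convention for rank-based tests, assumed here, is \(r = |z| / \sqrt{n_1 + n_2}\); the sketch below applies it to the two rows above, and under this assumption it matches the tabulated r values to three decimals.

```python
import math


def effect_size_r(z: float, n1: int, n2: int) -> float:
    """Effect size r = |z| / sqrt(N), with N = n1 + n2 (assumed convention)."""
    return abs(z) / math.sqrt(n1 + n2)


# (cluster label, z, n1, n2) taken from the table above
rows = [
    ("Moderate Arousal-Potential neg Emotionality major", 3.530, 250, 275),
    ("Moderate Arousal-Potential neg Emotionality minor", 3.551, 288, 283),
]

for label, z, n1, n2 in rows:
    print(label, round(effect_size_r(z, n1, n2), 3))
```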
| Country | mood_clust_fct | Group1 | Group2 | n1 | n2 | z | p.adj | r |
|---|---|---|---|---|---|---|---|---|
| CH | Higher Arousal-Potential pos Emotionality minor | No_Pandemic | Pandemic | 79 | 63 | -2.866 | 0.042 | 0.214 |
| CH | Moderate Arousal-Potential neg Emotionality minor | No_Pandemic | Pandemic | 76 | 51 | -3.698 | 0.002 | 0.328 |
| DE | Moderate Arousal-Potential neg Emotionality minor | No_Pandemic | Pandemic | 190 | 217 | 3.747 | 0.002 | 0.185 |
Figure 4
Combined Box, Violin, and Scatter Plots of the K-Means Cluster Solution for Each DACH Country and Across all DACH Countries Against Their Median Stream Counts Before and During the Pandemic.
track_ids and countries and contain balanced classes; 50/50 ratio)
Table 5
SVM Classification: Confusion Matrix
| Class | Correctly Classified (Song IDs) | Actual (Song IDs) | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|---|
| No Pandemic | 11236 | 11487 | 97.93 | 97.81 | 97.87 |
| Pandemic | 11244 | 11482 | 97.82 | 97.92 | 97.87 |
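The per-class metrics follow directly from the counts in the table. The sketch below recomputes them, assuming a strictly binary task in which every misclassified song of one class was assigned to the other class; the function and variable names are illustrative. Under that assumption the results agree with the tabulated values to within rounding in the last digit.

```python
def class_metrics(tp: int, actual: int, other_tp: int, other_actual: int):
    """Precision, recall, and F1 for one class of a binary classifier.

    tp / actual: correctly classified and actual observations of this class.
    other_tp / other_actual: the same counts for the other class.
    """
    fn = actual - tp              # this class, predicted as the other class
    fp = other_actual - other_tp  # other class, predicted as this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the confusion matrix table
no_pandemic = class_metrics(11236, 11487, 11244, 11482)
pandemic = class_metrics(11244, 11482, 11236, 11487)

# Overall accuracy: all correct classifications over all observations
accuracy = (11236 + 11244) / (11487 + 11482)

for name, (p, r, f1) in [("No Pandemic", no_pandemic), ("Pandemic", pandemic)]:
    print(name, round(p * 100, 2), round(r * 100, 2), round(f1 * 100, 2))
print("ACC", round(accuracy * 100, 2))
```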
Figure 5
Permutation-based Variable Importance (Independent Variables) for the SVM Model (Training Set).
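Permutation-based importance measures how much a model's performance drops when the values of one input variable are shuffled, breaking its link to the outcome. The following is a generic sketch of that idea with an invented toy model and toy data; it is not the SVM or the R tooling used for the figure, and the choice of accuracy as the performance measure is an assumption.

```python
import random


def accuracy(predict, X, y):
    """Share of rows for which the prediction equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=1):
    """Mean accuracy drop per feature after shuffling that feature."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the data with only this one column permuted
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(row):
    """Hypothetical stand-in model that looks at feature 0 only."""
    return int(row[0] >= 0.5)


# Invented toy data: feature 0 determines the label, feature 1 is noise
X = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.8], [0.4, 0.2],
     [0.6, 0.7], [0.7, 0.3], [0.8, 0.6], [0.9, 0.4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

imp = permutation_importance(toy_predict, X, y)
print(imp)
```

In this toy setup, shuffling the ignored feature leaves accuracy unchanged, while shuffling the informative one lowers it; larger drops indicate more important variables.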
Figure 6
Partial Dependence Plot of the Mood Clusters With Averaged Probabilities Regarding the Classification of the Pandemic Period.
Songs that belong to the clusters Higher Arousal-Potential pos Emotionality minor and Moderate Arousal-Potential neg Emotionality minor were streamed less during the pandemic in Switzerland.
In Germany, songs that belong to the cluster Moderate Arousal-Potential neg Emotionality minor were streamed more during the pandemic.
In Austria, no significant differences could be observed.
Across all DACH countries, the songs belonging to the clusters Moderate Arousal-Potential neg Emotionality (both in major and minor) have been streamed more often during the pandemic.
We have evidence to support H2: our binary SVM classifier yielded a high overall accuracy (ACC = 97.87%, 95% CI [97.7%, 98.1%]).
Furthermore, we now know that the mood clusters are the most important input variables in classifying the pandemic period.
That is, each period shows a distinct profile in terms of the mood clusters, the audio features of the tracks, and the grouping factor of the DACH countries.
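The interval around the overall accuracy can be approximated from the confusion matrix counts. The source does not state which method produced the reported interval; the sketch below assumes a simple Wald (normal approximation) interval, which under that assumption gives the same bounds when rounded to three decimals.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Correct classifications and total observations from the confusion matrix
correct = 11236 + 11244
total = 11487 + 11482

lower, upper = wald_ci(correct, total)
print(round(correct / total, 4), round(lower, 3), round(upper, 3))
```

With roughly 23000 observations the interval is narrow, so the high accuracy is unlikely to be an artifact of sampling noise alone.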
Please feel free to contact me or Nick:
KewKalustian
Kework K. Kalustian
Slides created with the R package revealjs.